Today’s investigation. Sampling living things is a basic practice in biology; after all, we can’t measure every member of a population. Biologists are constrained by logistics, time, and funding, and thus rely on measuring samples. Because the features of a sample are never identical to those of the whole population, measurements made from a sample are affected by who gets sampled and who does not. Even more challenging, measurements also depend on who is doing the sampling (us!). Today, through digital image analysis of carnivore skulls from the CSU Long Beach CNSM Vertebrate Collections, we will explore some of the factors that give rise to sampling error and, consequently, decrease measurement precision and accuracy.
Introduction
In this lab we will explore the difference between measurement accuracy and precision while getting some hands-on experience with digital image analysis. A fundamental task for biologists is to minimize both sampling error (the difference between our estimate and the true population parameter due to chance) and bias (a systematic discrepancy between our estimate and the true population parameter). Sampling error mainly reduces precision, whereas bias reduces accuracy, and the two can vary independently: one group of estimates can be accurate but imprecise, while another can be precise but inaccurate. As you might guess, we need both high accuracy and high precision to get close to the truth. So, how can we quantify the precision and accuracy of our measurements in order to minimize error and bias?
Today, we will measure the canine length of carnivores using linear distances on digital images (Fig. 1) to quantify measurement accuracy and precision, and to contrast sampling error, bias, and objectivity in measurement recording. Digital image analysis provides a means to increase precision (e.g., through high resolution), but it can also be a source of bias (e.g., curvature of the visual field caused by the lens). The measurement process is not the only thing that affects our estimates, however; we ourselves can also introduce error and bias through poor scaling or other visual limitations. So, let’s first explore and quantify accuracy and precision in our own estimates; then, in the Take-home exercise, we will analyze the entire group’s data to investigate how sample size affects accuracy and precision.
Figure 1. Coyote skull and upper right canine, CSU Long Beach CNSM Vertebrate Collection.
Upon completion of this lab, you should be able to:
External study resources
Worked example
To get started, let’s remind ourselves of the definitions of accuracy and precision:
By definition, accuracy and precision are independent of each other, but both affect the conclusions of any investigation. Say that you recorded nine measurements (orange dots in Fig. 2) ranging from 7.1 to 11.2 units, with a mean of 8.3 units. The difference between the minimum and maximum values (the range) is one way to express your precision; here it is 11.2 - 7.1 = 4.1 units, and a smaller range means higher precision. Thus, reporting a measurement as 8.3 ± 4.1 units implies that the individual’s value lies somewhere between 4.2 and 12.4 units. This says nothing about your accuracy. Here, the true value is 3 units (black dot in Fig. 2), which is 4.1 units below the minimum value, 8.2 units below the maximum value, and 5.3 units below the mean. So, with the available information, we can argue that the measurements are relatively precise but highly inaccurate: no single estimate is close to the true value.
Figure 2. Measurement accuracy and precision given its true value.
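The arithmetic in the example above can be checked in R, the language used later in this lab. This sketch uses only the summary values stated in the text (minimum, maximum, mean, and true value), not the individual points plotted in Fig. 2, which are not reproduced here.

```r
# Summary values from the worked example (not the raw Fig. 2 points)
x_min  <- 7.1   # smallest measurement
x_max  <- 11.2  # largest measurement
x_mean <- 8.3   # mean of the nine measurements
TV     <- 3     # true value (black dot in Fig. 2)

# Precision expressed as the range of the measurements
precision <- x_max - x_min
precision        # 4.1

# Interval implied by reporting the mean plus or minus the range
implied <- c(lower = x_mean - precision, upper = x_mean + precision)
implied          # 4.2 and 12.4

# Accuracy: absolute distance of each summary value from the true value
distance_to_TV <- abs(c(min = x_min, max = x_max, mean = x_mean) - TV)
distance_to_TV   # 4.1, 8.2 and 5.3
```

Every distance to the true value is larger than the range itself, which is another way of seeing "precise but inaccurate".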
Materials and Methods
1. Digital image analysis
A. If you have not done so already, download ImageJ.
B. Open an Excel spreadsheet and name it LastnameFirstname_Species.
Create a header row with the following four columns: “image_number”, “specimen_id”, “tooth_side”, and “length_cm”. The values for the first three columns can be found in each image’s file name. For example, the first image of the bobcat specimen is named “1-BCat_L21_left”; thus, image_number is “1”, specimen_id is “BCat_L21”, and tooth_side is “left”. length_cm is the canine length you will measure later.
Save this file and keep it open; you will use it to record all measurements. Before the data analysis, you will also need to save a copy in .csv format so it can be imported into R.
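If you would rather not type the first three columns by hand, they can be pulled out of the file names in R. This is a minimal sketch that assumes every file name follows the pattern shown above, `<image_number>-<specimen_id>_<tooth_side>`, optionally followed by an extension such as .jpg. The second file name below is hypothetical, and the regular expressions would need adjusting if your files are named differently.

```r
# Hypothetical file names following the pattern "<number>-<specimen>_<side>"
files <- c("1-BCat_L21_left.jpg", "4-BCat_L21_right.jpg")

# Drop any file extension (the tools package ships with base R)
stems <- tools::file_path_sans_ext(files)

# Split each name into the spreadsheet columns using regular expressions
canine_template <- data.frame(
  image_number = as.integer(sub("^([0-9]+)-.*$", "\\1", stems)),
  specimen_id  = sub("^[0-9]+-(.*)_(left|right)$", "\\1", stems),
  tooth_side   = sub("^.*_(left|right)$", "\\1", stems),
  length_cm    = NA_real_,   # filled in after measuring in ImageJ
  stringsAsFactors = FALSE
)
canine_template

# Optionally save it as the starting point of your spreadsheet:
# write.csv(canine_template, "LastnameFirstname_Species.csv", row.names = FALSE)
```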
C. Open ImageJ. You will repeat the following steps for each image.
D. Select an image for analysis. In the main menu bar, go to File -> Open and select an image. You should analyze a total of six images (three of each canine). The images include duplicates of the same teeth, so measure them in the order given (1 to 6) to avoid additional bias. You can zoom in and out of the image using the standard shortcuts or Image -> Zoom.
Set the image scale. With the Straight Line tool selected, drag the cursor from one hatch mark to the next so that the line spans exactly 1 cm (see image below).
Go to Analyze -> Set Scale. In “Known distance”, type “1”, and in “Unit of length”, type “cm”. Leave the other fields as they are and click “OK”. This step is very important: you are assigning the reference that the software will use to convert your measurements from pixels into centimeters. You are now ready to measure!
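Behind the scenes, setting the scale tells ImageJ how many pixels make up 1 cm; a straight line drawn afterwards is measured in pixels and divided by that number. The sketch below reproduces that arithmetic in R with made-up pixel coordinates (they are not from the lab images) and assumes square pixels, i.e. the default pixel aspect ratio of 1 in the Set Scale dialog.

```r
# Hypothetical pixel coordinates (x, y) of line endpoints; not from the lab images
scale_start <- c(x = 100, y = 500)   # first hatch mark
scale_end   <- c(x = 160, y = 500)   # next hatch mark, 1 cm away
tooth_tip   <- c(x = 300, y = 380)
tooth_base  <- c(x = 390, y = 260)

# Straight-line (Euclidean) distance between two points, in pixels
line_length_px <- function(p1, p2) sqrt(sum((p2 - p1)^2))

# Set Scale: the known distance is 1 cm, so pixels per cm = scale line length / 1
px_per_cm <- line_length_px(scale_start, scale_end) / 1
px_per_cm          # 60

# Measure: convert the tooth line from pixels to cm
length_cm <- line_length_px(tooth_tip, tooth_base) / px_per_cm
length_cm          # 2.5

# A scale line drawn 5 px too long shrinks every later measurement (bias)
length_cm_biased <- line_length_px(tooth_tip, tooth_base) / (px_per_cm + 5)
length_cm_biased
```

Because every measurement on the image is divided by the same scale, a careless scale line introduces a systematic error (bias) rather than random scatter.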
E. Measure the canine length in cm in each image.
Drag the cursor to draw a straight line from the tip of the tooth to the uppermost point of its base, where the tooth attaches to the skull.
With the line drawn, go to Analyze -> Measure. A Results window should appear listing several measurements, including “Length” (see figure below). Length is the only measurement of interest for this exercise.
Go to your Excel file and record the length measurement from each image. You should finish with a total of six measurements (one per image). Don’t forget to record the specimen ID and tooth side for each one!
Info-Box! Before any data analysis comes a very important step: data management. Through data management, we arrange our data set so that it is ready for analysis. There are some general guidelines for good data management.
Rules for good spreadsheets:
2. Data analysis
Summarize the results of your image analysis for each canine and estimate measurement accuracy and precision. Include:
Although we only have a sample size of three (i.e., three images per canine), let’s practice computing this simple summary in R. Note: the example below uses the column headers suggested in Part 1 (step B).
A. Import your data into RStudio (make sure you have saved your spreadsheet as a .csv file). For simplicity, let’s call our data object “canine”. Hint: check out our past exercise’s script!
# Importing data (replace "canine.csv" with the name of your saved .csv file)
canine <- read.csv("canine.csv", header = TRUE)
# viewing the data
canine
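Before summarizing, it can help to confirm that the imported data look the way the code in this lab expects. This is a hedged sketch: `check_canine()` is a hypothetical helper written for illustration, not a function from base R or the tidyverse, and the small data frame is made up only to show the check running. With your own data you would call `check_canine(canine)` right after `read.csv()`.

```r
# Hypothetical helper: stop with a message if the data do not match Part 1
check_canine <- function(df) {
  expected <- c("image_number", "specimen_id", "tooth_side", "length_cm")
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df$length_cm)) {
    stop("length_cm is not numeric; check for typos or units typed into cells")
  }
  if (!all(df$tooth_side %in% c("left", "right"))) {
    stop("tooth_side should contain only 'left' or 'right'")
  }
  invisible(TRUE)
}

# Made-up example data with the expected layout (values are not real)
demo_data <- data.frame(
  image_number = 1:2,
  specimen_id  = c("BCat_L21", "BCat_L21"),
  tooth_side   = c("left", "right"),
  length_cm    = c(1.8, 1.9),
  stringsAsFactors = FALSE
)
check_canine(demo_data)   # returns TRUE invisibly when the layout is as expected
str(demo_data)            # column names and types at a glance
```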
B. Summarize your data across both canines. For that, we can use the function summary().
# summarizing the data
s <- summary(canine)
s
Questions
C. Summarize the data across both canines, step by step. As in any language, there are many different ways to “say” the same thing in R; that is, other functions can produce the same data summary, although they may need a bit more coding. For instance, to get the minimum value of a column in our data we can use the function min(), but we need to explicitly indicate the column of interest with the $ sign (meaning “within”):
# minimum value
m1 <- min(canine$length_cm)
m1
# maximum value
m2 <- max(canine$length_cm)
m2
# mean value
m3 <- mean(canine$length_cm)
m3
# range between minimum and maximum values (precision)
r <- m2-m1
r
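Given the true value of canine length, TV (provided by your instructor), we can approximate the accuracy of our estimates as the difference between the mean estimate and TV.
# store the true value first: replace NA with the true canine length (cm) from your instructor
TV <- NA
# difference between the mean and the true value (accuracy); the sign shows over- or underestimation
m3 - TV
# using the absolute value (size of the error only)
abs(m3 - TV)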
However, measurements may differ between the left and right canines. So, let’s now carry out the data summary for each tooth separately to estimate the accuracy and precision of our measurements. This involves a bit of data management: we will use the filter() function from the tidyverse (it comes from the dplyr package) to keep only the rows for one tooth, and then summarize them. Hint: check out our past exercise’s script!
# loading the package
library(tidyverse)
# viewing the data
canine
# filtering by the left canine (column name from Part 1)
lt <- filter(canine, tooth_side == "left")
lt
# summary
summary(lt)
# precision (range)
max(lt$length_cm) - min(lt$length_cm)
# accuracy (absolute difference from the true value)
abs(mean(lt$length_cm) - TV)
Questions
How precise were your measurements of the upper left canine length? Hint: Precision refers to the spread of measurements resulting from sampling error, that is, how much the measurements varied and how closely they agree with each other.
How accurate was your mean length measurement of the upper left canine? Hint: Accuracy refers to how close the measurements are to the true value.
What factors or processes do you think could have influenced accuracy and precision? Hint: Think about the logistics of digital image analysis in ImageJ, potential sources of bias, and objectivity in measurement recording.
Stop, Think, Do: Now it is your turn to compute the descriptive summary for the upper right canine in your data and quantify accuracy and precision. Stop and review the code you just ran. Think about how you could modify it to do the same analysis for the right canine. Hint: give the new object for the right canine an appropriate name that does not overwrite the one for the left canine (“lt”). Do the analysis and be ready to present it!
Take-home exercise
Analyze a larger data set and explore whether sample size affects your conclusions on measurement accuracy and precision.
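One way to see how sample size changes the picture is to recompute precision and accuracy on the first n measurements for increasing n. The sketch below is only an illustration: it uses simulated lengths from `rnorm()` (with an arbitrary mean, spread, and true value) in place of the master data file, and it assumes measurements are taken in the order they appear. Note that the range can only stay the same or grow as measurements are added, so by this measure a larger data set will usually look less precise even when individual measurements are no worse.

```r
set.seed(1)  # make the simulated example reproducible

# Simulated stand-in for the master file: 30 measurements of one canine (cm)
TV <- 2.50                                      # assumed true value for the demo
lengths <- rnorm(30, mean = 2.55, sd = 0.08)    # slight upward bias built in

# Recompute precision (range) and accuracy (|mean - TV|) on the first n values
n_values <- c(3, 5, 10, 20, 30)
by_n <- data.frame(
  n         = n_values,
  precision = sapply(n_values, function(n) diff(range(lengths[1:n]))),
  accuracy  = sapply(n_values, function(n) abs(mean(lengths[1:n]) - TV))
)
by_n
```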
Task. Using the master data file for your assigned carnivore, which includes your own measurements and those of the rest of the class, carry out the data summary for each of the two teeth, including estimates of measurement accuracy and precision. Generate a well-organized table comparing the accuracy and precision from your own in-class analysis with those from the master data file. Accompany the table with a concluding paragraph that summarizes your results (i.e., the information in the table) and discusses them. Your assignment should include three files: (1) a .doc file with the table and paragraph, (2) an .R file containing your annotated R script with all the steps (including those for your individual analysis and for the analysis of the master file), and (3) the .csv file containing the data analyzed.
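The comparison table could be assembled in R along these lines. This is a sketch under several assumptions: both files use the column names from Part 1, a single true value `TV` applies to both teeth (as in the in-class code), and `summarize_side()` and `summarize_all()` are hypothetical helpers, not built-in functions. The two small data frames are made up so the example runs; replace them with `read.csv()` calls on your own file and the master file.

```r
# Hypothetical helper: precision (range) and accuracy (|mean - TV|) for one tooth
summarize_side <- function(df, side, TV) {
  x <- df$length_cm[df$tooth_side == side]
  data.frame(
    tooth_side = side,
    n          = length(x),
    mean_cm    = mean(x),
    precision  = max(x) - min(x),
    accuracy   = abs(mean(x) - TV),
    stringsAsFactors = FALSE
  )
}

# Summarize both teeth in one data set and label where the numbers came from
summarize_all <- function(df, source, TV) {
  out <- do.call(rbind, lapply(c("left", "right"), summarize_side, df = df, TV = TV))
  data.frame(data_set = source, out, stringsAsFactors = FALSE)
}

# Made-up values so the sketch runs; replace with read.csv() on your files
mine <- data.frame(
  tooth_side = rep(c("left", "right"), each = 3),
  length_cm  = c(2.41, 2.47, 2.44, 2.52, 2.49, 2.55)
)
master <- rbind(mine, data.frame(
  tooth_side = rep(c("left", "right"), each = 3),
  length_cm  = c(2.38, 2.50, 2.46, 2.58, 2.47, 2.51)
))
TV <- 2.50  # hypothetical true value; use the one from your instructor

comparison <- rbind(
  summarize_all(mine,   "in class (mine)", TV),
  summarize_all(master, "master file",     TV)
)
comparison
```

Keeping one row per data set and tooth makes it easy to paste the table into your .doc file or export it with write.csv().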
Guided questions for writing the concluding paragraph: